1 Introduction

With large amounts of social data readily available through web-based social media and
other sources, computational solutions are required for effective social modeling and data
analysis. The social network is a powerful paradigm for representing, visualizing, interpreting
and analyzing information. Social Network Analysis (SNA) employs graph-theoretic
methodologies to mathematically define, analyze and quantify relevant metrics in social
networks, allowing for their interpretation and classification. Research in SNA has led to
several methodologies that have come to be widely used. Measures such as centrality,
connectivity, degree and clique size have become standard in SNA.

However, large network sizes and dynamism continue to be important issues in SNA. As more
people use online social networking and mobile computing apps, the social networks that
have to be processed continue to grow. Additionally, the availability of real-time social
information means that these networks change continuously, which raises issues of dynamism.

Although  there  has  been  interesting  research  in  SNA,  it  has  been  scattered  and  narrowly 
applied.  There  is  a  need  for  an  overarching  framework  that  provides  a  common  representation 
for  the  different  methodologies,  making  it  easy  to  identify  their  similarities  and  differences.  This 
will  be  helpful  in  designing  new  algorithms  for  SNA  and  understanding,  a  priori,  their 
performance  and  utility.  The  framework  should  also  take  into  account  other  critical  aspects 
such  as  performance  evaluation  and  methodology  classification. 

In pursuit of this goal, we proposed the Social Network Analysis: Classification, Evaluation and
Methodology (SNA-CEM) framework. SNA-CEM consists of Methodology, Evaluation and
Classification components, each encapsulating a critical aspect of the framework.
The Methodology component deals with mathematically representing various SNA methodologies.
The Evaluation component consists of performance techniques and metrics to measure the utility of
SNA methodologies. The Classification component uses the measures from the Evaluation
component  and  representations  from  the  Methodology  component  to  group  methods  into 
categories,  based  on  their  similarity  with  respect  to  utility  and  performance.  In  our  project  we 
have  focused  on  methodologies  that  have  a  recursive  structure  and  can  be  partitioned  in  a 
parallel/distributed  environment.  The  utility  measures  used  for  evaluating  methodologies  are 
solution  quality  and  time  performance. 
because each processor focuses only on a small problem (the local subgraph), while the
serial algorithm deals with the original large network, which introduces additional overheads
such as non-contiguous memory accesses.
Figure 1: Ego-betweenness centrality for graphs with average degree=8 (Density II) [1]

To validate the anywhere aspects of the ego-betweenness algorithm, we measured the
processing time for adopting 64 random changes in the graphs. Although the absolute time
cost is an important factor in evaluating system performance, it depends on graph size and
density, so it is more useful to express the cost of a change relative to the cost of a full
computation. Each relative cost is calculated as p = max(r, c) / C, where p is the
relative cost, r is the time cost for adopting random changes, c is the time cost for adopting max
degree changes, and C is the time cost for calculating every node's ego-betweenness centrality
on the static graph using the serial algorithm. The experimental results show that the
relative cost decreases as graph size increases, even though the absolute time cost for
adopting dynamic changes grows. This indicates that as the graph becomes larger, the
fraction of previously obtained results affected by a change becomes smaller. The maximum
relative cost for adopting one edge change is about 0.055%. Using theoretical and experimental
analysis, we have demonstrated that our methodology for ego-betweenness centrality
measurement can efficiently handle graph dynamism.
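The locality behind the low relative cost can be illustrated with a minimal Python sketch. It is not the parallel implementation evaluated above; it assumes an undirected graph stored as a dict of neighbour sets without self-loops, and the function names (ego_betweenness, affected_nodes, toggle_edge, relative_cost) are hypothetical. An edge change (u, v) alters only the ego networks that contain both endpoints, namely those of u, v and their common neighbours, so only those scores are recomputed.

```python
from itertools import combinations


def ego_betweenness(adj, ego):
    """Betweenness of `ego` inside its own ego network (hypothetical helper).

    For each pair of non-adjacent neighbours, add 1 / (number of two-step
    paths between them that stay inside the ego network).
    """
    nbrs = adj[ego]
    score = 0.0
    for a, b in combinations(sorted(nbrs), 2):
        if b in adj[a]:
            continue  # directly connected: the shortest path bypasses the ego
        # Two-step paths go through the ego itself or through another
        # neighbour of the ego that is adjacent to both a and b.
        paths = 1 + len(adj[a] & adj[b] & nbrs)
        score += 1.0 / paths
    return score


def affected_nodes(adj, u, v):
    """Nodes whose ego network contains both endpoints of edge (u, v)."""
    return {u, v} | (adj[u] & adj[v])


def toggle_edge(adj, scores, u, v):
    """Add edge (u, v) if absent, remove it if present, then refresh only
    the scores of the affected nodes. Returns the set of refreshed nodes."""
    if v in adj[u]:
        adj[u].discard(v)
        adj[v].discard(u)
    else:
        adj[u].add(v)
        adj[v].add(u)
    touched = affected_nodes(adj, u, v)
    for w in touched:
        scores[w] = ego_betweenness(adj, w)
    return touched


def relative_cost(r, c, C):
    """p = max(r, c) / C, as defined in the text."""
    return max(r, c) / C
```

With purely illustrative timings (not measured values), relative_cost(2.0, 4.0, 8.0) evaluates to 0.5. In a large sparse graph the set returned by affected_nodes is small compared with the number of nodes, which is consistent with the decreasing relative cost reported above.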

3.1.2 Maximal Clique Enumeration

A clique of a graph is a subgraph that has an edge between every pair of nodes. A maximal
clique is a clique that is not contained in any other clique. Enumerating maximal cliques is an
NP-hard problem. Our methodology generates the maximal cliques by building incrementally
larger cliques. This gives it a natural anytime property that can be exploited in the IA phase.
We also developed anywhere approaches to deal with two types of dynamism: edge addition
and edge removal.
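The anytime property of generating cliques in order of increasing size can be sketched as follows. This is a simple level-wise enumeration written for illustration, not the parallel algorithm of this report; the function name maximal_cliques_by_size and the dict-of-sets graph representation (undirected, no self-loops) are assumptions.

```python
def maximal_cliques_by_size(adj):
    """Yield (k, list of maximal cliques of size k) for k = 1, 2, ...

    Level k holds every clique of size k. A clique that no node can extend
    is maximal and is reported; the others are grown by one node to form
    level k + 1. The caller may stop consuming the generator at any time
    and keep the maximal cliques reported so far.
    """
    level = {frozenset([v]) for v in adj}
    k = 1
    while level:
        next_level = set()
        maximal = []
        for clique in level:
            # Nodes adjacent to every member of the clique can extend it.
            candidates = set.intersection(*(adj[v] for v in clique))
            if candidates:
                for w in candidates:
                    next_level.add(clique | {w})
            else:
                maximal.append(clique)
        yield k, maximal
        level = next_level
        k += 1
```

Interrupting the generator after level k leaves a complete list of the maximal cliques of size at most k, which is the kind of partial result an anytime scheme can return. The sketch stores every clique of the current size, so it is only practical when cliques stay small, as in the test graphs described below.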


To validate our methodology, we compared its performance with a serial algorithm based on
Zhang's algorithm. We studied graphs of size 5000 to 30000 with various densities and no
maximal cliques of size larger than 3. For brevity, we focus on the analysis for graphs of
density II (average degree of 8), with the experimental results shown in Figure 2. Figure 2
shows that the time for finding all maximal cliques increases with graph size, and that the
time cost decreases as the number of processors used in our system increases. Our parallel
approach also solves the maximal clique enumeration problem faster than the typical serial
algorithm.


Figure 2: Time costs for enumerating maximal cliques in graphs with average node degree=8 (Density II) [4]
Figure 3: Relative cost for adopting one dynamic edge change for maximal clique enumeration. S: size of the graph; RCx: relative cost for graphs with average degree x [4]

We tested the anywhere approach for maximal clique enumeration on two sets of 8 dynamic
edge changes: random edge changes and max degree edge changes. We measured the time
cost of our system to handle each edge change and, for each set, took the average value as
the time cost for adopting one dynamic change. Figure 3 shows that our anywhere approach
for maximal clique enumeration can effectively handle dynamic graphs. The relative cost for
adopting a random edge change is less than 3.5%.
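Why a single edge change is cheap relative to a full enumeration can be seen in a small sketch of the edge addition case. This is an illustrative serial update under stated assumptions, not the system measured above: the graph is a dict of neighbour sets, the current maximal cliques are held in a set of frozensets, and the names bron_kerbosch and add_edge are hypothetical. Adding edge (u, v) can only invalidate cliques that contain u or v, and every new maximal clique consists of u, v and a maximal clique among their common neighbours.

```python
def bron_kerbosch(adj, r, p, x, out):
    """Classic Bron-Kerbosch recursion without pivoting.

    r: current clique, p: candidate nodes, x: already processed nodes.
    Appends each maximal clique extending r to `out`.
    """
    if not p and not x:
        out.append(frozenset(r))
        return
    for v in list(p):
        bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v], out)
        p = p - {v}
        x = x | {v}


def add_edge(adj, cliques, u, v):
    """Insert edge (u, v) and update `cliques` in place.

    Returns (stale, fresh): the cliques removed and the cliques added.
    """
    adj[u].add(v)
    adj[v].add(u)
    common = adj[u] & adj[v]
    # An old clique through one endpoint is no longer maximal if the other
    # endpoint is adjacent to all of its remaining members.
    stale = {c for c in cliques
             if (u in c and c - {u} <= adj[v]) or (v in c and c - {v} <= adj[u])}
    # New maximal cliques: {u, v} plus a maximal clique of the subgraph
    # induced by the common neighbours of u and v.
    fresh = []
    bron_kerbosch(adj, {u, v}, set(common), set(), fresh)
    cliques -= stale
    cliques |= set(fresh)
    return stale, set(fresh)
```

The work is confined to the neighbourhoods of u and v, so it grows with local degree rather than with graph size. Edge removal needs a separate treatment (cliques containing both endpoints must be split) and is not covered by this sketch.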


3.2  Real  World  Scenarios 

In  accordance  with  our  goal  3,  we  have  applied  the  anytime-anywhere  algorithms  to  real  world 
scenarios.  This  was  done  as  part  of  our  transition  to  other  projects  dealing  with  socio-cultural 
behavioral models. The Culturally Infused Social Networks (CISN) [5] framework, which was
developed as part of a project titled "A Framework for Adversarial Social Networks" funded by
the Defense Threat Reduction Agency (DTRA) [6], models and analyzes complex social processes by
incorporating fine-grained socio-cultural information onto individual nodes in social networks.
We  deployed  anytime-anywhere  algorithms  for  measuring  closeness  centrality  in  modeling 
gang violence in Haiti. Gangs in Haiti had become a problem because they operated with impunity
in many areas. They gained the support of the people by providing basic services to the populace.
Social  networks  representing  the  interactions  between  residents  of  a  town  in  Haiti,  and  cultural 
fragments  representing  their  ideology,  were  generated  to  model  the  scenario.  The  anytime- 
anywhere  methodology  provided  the  capability  to  extend  CISN  to  large  and  dynamic  graphs  in 
the  Haiti  scenario. 

4 Concluding Remarks, Future Directions and Transition

The goals and objectives of this project were met, as discussed in the previous sections of this
report.

In  this  project,  we  have  not  only  validated  the  anytime-anywhere  methodology  but  also 
demonstrated  its  general  applicability  to  real  world  scenarios.  The  SNA  design  methods  from 
SNA-CEM  are  also  being  leveraged  to  model  behavior  of  populations  in  cross-border  epidemics 
[7],  as  part  of  a  project  funded  by  the  Department  of  Homeland  Security  (DHS).  Understanding 
cross-border  immigration  during  epidemics  will  help  border  security  and  border  health  agencies 
to  be  better  prepared. 

The  anytime-anywhere  framework  for  SNA  is  also  leveraged  to  model  and  analyze  social 
processes  in  network  centric  systems  in  the  context  of  Network  Centric  Operations/Network 
Centric  Warfare  (NCO/NCW),  as  part  of  a  project  funded  under  the  Army  High  Performance 
Computing  Research  Center  (AHPCRC)  initiative.  The  social  relations  between  human  actors  in 
the  network  are  key  factors  in  understanding  decision  making  and  situational  awareness  in 
NCO/NCW.  As  part  of  the  project,  parallel  and  distributed  SNA  algorithms  will  be  developed  to 
analyze  these  processes  in  very  large  and  dynamic  social  networks. 